
Conversation


@RahulC7 RahulC7 commented Nov 28, 2025

Summary:

Context

We continue from D84284794 to add support for 16-bit activations. Note that although 16-bit activations are already supported, that is only the case when the weights are also 16-bit. To lift this restriction, we need to change the way we template some functions.

Current Behavior

Right now, we're composing two macros together, the ET_FORALL_JARVIS_QUANTIZED_TYPES_WITH_INT16 macro:

https://www.internalfb.com/code/fbsource/[9e8c6d8466107f58aa3de1b9e4ec71c49d670a8f]/fbcode/on_device_ai/Assistant/Jarvis/min_runtime/operators/generic/operators.h?lines=22-25

and the function macro (`quantized_linear`, chosen as an example):

https://www.internalfb.com/code/fbsource/[9e8c6d8466107f58aa3de1b9e4ec71c49d670a8f]/fbcode/on_device_ai/Assistant/Jarvis/min_runtime/operators/generic/quantized_linear_out.cpp?lines=30-41

Composed, they expand into a switch statement that calls the `quantized_linear` function with the correct template parameter.

However, this assumes that the input activations and weights share the same dtype, which does not hold for 16-bit activations with 8-bit weights.
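The macro-composition pattern described above can be sketched as follows. This is an illustrative toy, not the actual Jarvis code: `FORALL_QUANTIZED_TYPES`, `CASE`, and `quantized_linear_` are hypothetical names standing in for the real macros and kernel.

```cpp
#include <cassert>
#include <cstdint>

enum class ScalarType { Char, Short };

// FORALL-style macro: invokes _() once per supported quantized dtype.
#define FORALL_QUANTIZED_TYPES(_) \
  _(int8_t, Char)                 \
  _(int16_t, Short)

// Toy kernel templated on a single dtype T, used for both activations
// and weights -- the very assumption this stack of diffs removes.
template <typename T>
int64_t quantized_linear_(int64_t activation, int64_t weight) {
  return static_cast<int64_t>(static_cast<T>(activation)) *
         static_cast<int64_t>(static_cast<T>(weight));
}

// Composing the two macros yields one switch case per dtype, each
// instantiating the kernel with the matching template parameter.
int64_t dispatch(ScalarType dtype, int64_t activation, int64_t weight) {
#define CASE(ctype, name) \
  case ScalarType::name:  \
    return quantized_linear_<ctype>(activation, weight);
  switch (dtype) { FORALL_QUANTIZED_TYPES(CASE) }
#undef CASE
  return 0;
}
```

Because the switch selects a single `T` for both operands, there is no case that pairs int16 activations with int8 weights, which is exactly the limitation this diff works around.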

This Diff

We now use the generic implementation in the backends and create a unit test as well as e2e tests.

Differential Revision: D87997149

… for linear in backends

Summary:
# Context
We continue from D84284794 to add support for 16-bit activations. Note that although 16-bit activations are already supported, that is only the case when the weights are also 16-bit. To lift this restriction, we need to change the way we template some functions.


# Current Behavior
Right now, we're composing two macros together, the `ET_FORALL_JARVIS_QUANTIZED_TYPES_WITH_INT16` macro: 

https://www.internalfb.com/code/fbsource/[9e8c6d8466107f58aa3de1b9e4ec71c49d670a8f]/fbcode/on_device_ai/Assistant/Jarvis/min_runtime/operators/generic/operators.h?lines=22-25


and the function macro (`quantized_linear`, chosen as an example):

https://www.internalfb.com/code/fbsource/[9e8c6d8466107f58aa3de1b9e4ec71c49d670a8f]/fbcode/on_device_ai/Assistant/Jarvis/min_runtime/operators/generic/quantized_linear_out.cpp?lines=30-41



Composed, they expand into a switch statement that calls the `quantized_linear` function with the correct template parameter.

However, this assumes that the input activations and weights share the same dtype, which does not hold for 16-bit activations with 8-bit weights.

# This Diff
We finish by using the generic implementation for all the backends and adding e2e tests as well as unit tests.

Reviewed By: hsharma35

Differential Revision: D87946776
…for Conv2D in Backends (pytorch#16007)

Summary:

# Context
We continue from D84284794 to add support for 16-bit activations. Note that although 16-bit activations are already supported, that is only the case when the weights are also 16-bit. To lift this restriction, we need to change the way we template some functions.


# Current Behavior
Right now, we're composing two macros together, the `ET_FORALL_JARVIS_QUANTIZED_TYPES_WITH_INT16` macro: 

https://www.internalfb.com/code/fbsource/[9e8c6d8466107f58aa3de1b9e4ec71c49d670a8f]/fbcode/on_device_ai/Assistant/Jarvis/min_runtime/operators/generic/operators.h?lines=22-25


and the function macro (`quantized_linear`, chosen as an example):

https://www.internalfb.com/code/fbsource/[9e8c6d8466107f58aa3de1b9e4ec71c49d670a8f]/fbcode/on_device_ai/Assistant/Jarvis/min_runtime/operators/generic/quantized_linear_out.cpp?lines=30-41



Composed, they expand into a switch statement that calls the `quantized_linear` function with the correct template parameter.

However, this assumes that the input activations and weights share the same dtype, which does not hold for 16-bit activations with 8-bit weights.

# This Diff
We finish by using the generic implementation for all the backends and adding e2e tests as well as unit tests.

Differential Revision: D87993325
… for matmul in backends

Summary:
# Context
We continue from D84284794 to add support for 16-bit activations. Note that although 16-bit activations are already supported, that is only the case when the weights are also 16-bit. To lift this restriction, we need to change the way we template some functions.


# Current Behavior
Right now, we're composing two macros together, the `ET_FORALL_JARVIS_QUANTIZED_TYPES_WITH_INT16` macro: 

https://www.internalfb.com/code/fbsource/[9e8c6d8466107f58aa3de1b9e4ec71c49d670a8f]/fbcode/on_device_ai/Assistant/Jarvis/min_runtime/operators/generic/operators.h?lines=22-25


and the function macro (`quantized_linear`, chosen as an example):

https://www.internalfb.com/code/fbsource/[9e8c6d8466107f58aa3de1b9e4ec71c49d670a8f]/fbcode/on_device_ai/Assistant/Jarvis/min_runtime/operators/generic/quantized_linear_out.cpp?lines=30-41


Composed, they expand into a switch statement that calls the `quantized_linear` function with the correct template parameter.

However, this assumes that the input activations and weights share the same dtype, which does not hold for 16-bit activations with 8-bit weights.

# This Diff
We now use the generic implementation in the backends and create a unit test as well as e2e tests.

Differential Revision: D87997149
Copilot AI review requested due to automatic review settings November 28, 2025 21:00

pytorch-bot bot commented Nov 28, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16008

Note: Links to docs will display an error until the docs builds have been completed.

❗ 2 Active SEVs

There are 2 currently active SEVs. If your PR is affected, please view them below:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 28, 2025

meta-codesync bot commented Nov 28, 2025

@RahulC7 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D87997149.


Copilot AI left a comment


Pull request overview

This PR adds support for 16-bit activations with 8-bit weights (W8A16) for quantized operations in the Cadence HiFi backend by delegating to generic implementations when the heterogeneous type combination is detected.

Key Changes:

  • Modified backend operators (matmul, linear, conv2d) to detect and handle int16 activation + int8 weight combinations
  • Added build dependencies on generic operator implementations for W8A16 support
  • Created comprehensive unit tests for the new W8A16 functionality
  • Added new quantizer classes for 16-bit activation support in conv and matmul operations
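The detect-and-delegate pattern the first bullet describes can be sketched as below. This is a minimal illustration under assumed names: `Tensor`, `ScalarType`, `generic_quantized_linear`, and `optimized_quantized_linear` are stand-ins, not the actual ExecuTorch/Cadence signatures.

```cpp
#include <cassert>

enum class ScalarType { Char, Short };  // int8 / int16, illustrative

struct Tensor {
  ScalarType dtype;
};

ScalarType scalar_type(const Tensor& t) { return t.dtype; }

bool used_generic = false;  // instrumentation for this sketch only

// Stand-in for the portable/generic W8A16-capable implementation.
void generic_quantized_linear(const Tensor&, const Tensor&) {
  used_generic = true;
}

// Stand-in for the existing optimized HiFi kernel (homogeneous dtypes).
void optimized_quantized_linear(const Tensor&, const Tensor&) {
  used_generic = false;
}

void quantized_linear_out(const Tensor& input, const Tensor& weight) {
  // Heterogeneous W8A16 combination: fall back to the generic kernel,
  // since the optimized path assumes matching input/weight dtypes.
  if (scalar_type(input) == ScalarType::Short &&
      scalar_type(weight) == ScalarType::Char) {
    generic_quantized_linear(input, weight);
    return;
  }
  optimized_quantized_linear(input, weight);
}
```

The same early-return check is applied per operator (matmul, linear, conv2d NCHW/NHWC) in the files listed below.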

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.

Show a summary per file

| File | Description |
| --- | --- |
| backends/cadence/hifi/operators/op_quantized_matmul_out.cpp | Added W8A16 type check and generic implementation call for int16 activations |
| backends/cadence/hifi/operators/op_quantized_linear_out.cpp | Added W8A16 type checks for both per-channel and per-tensor variants |
| backends/cadence/hifi/operators/op_quantized_conv2d_nchw_out.cpp | Added W8A16 handling for NCHW format convolutions |
| backends/cadence/hifi/operators/op_quantized_conv2d_nhwc_out.cpp | Added W8A16 handling for NHWC format convolutions |
| backends/cadence/hifi/operators/targets.bzl | Updated build configuration with generic operator dependencies and removed redundant operator entries |
| backends/cadence/hifi/operators/op_quantized_matmul_out.h | Added public header for quantized_matmul_out function declaration |
| backends/cadence/hifi/operators/tests/test_op_quantized_matmul_out.cpp | Added unit tests for W8A16 matmul with transposed and non-transposed cases |
| backends/cadence/hifi/operators/tests/test_op_quantized_linear_out.cpp | Added unit test for W8A16 linear operation |
| backends/cadence/hifi/operators/tests/test_op_quantized_conv2d_out.cpp | Added unit tests for W8A16 conv2d in both NCHW and NHWC formats |
| backends/cadence/aot/quantizer/quantizer.py | Added quantizer classes for 16-bit activation support in conv and matmul patterns |


Comment on lines +238 to +239
}
else if (out.scalar_type() == executorch::aten::ScalarType::Byte) {

Copilot AI Nov 28, 2025


Missing space after closing brace on line 238. The 'else' should be on the same line as the closing brace of the preceding if block, or there should be consistent formatting with other similar conditionals in the file.

Suggested change
}
else if (out.scalar_type() == executorch::aten::ScalarType::Byte) {
} else if (out.scalar_type() == executorch::aten::ScalarType::Byte) {

Comment on lines +296 to +297
}
else if (out.scalar_type() == executorch::aten::ScalarType::Byte) {

Copilot AI Nov 28, 2025


Missing space after closing brace on line 296. The 'else' should be on the same line as the closing brace of the preceding if block, or there should be consistent formatting with other similar conditionals in the file.

Suggested change
}
else if (out.scalar_type() == executorch::aten::ScalarType::Byte) {
} else if (out.scalar_type() == executorch::aten::ScalarType::Byte) {

// With all ones input and weights, and kernel size 3x5=15 * 8 channels = 120
// After applying out_multiplier (0.5 * 2^31), the value is scaled by 0.5
// Expected value: 120 * 0.5 = 60
Tensor expected = tf_int16.full({1, 16, 18, 24}, 120);

Copilot AI Nov 28, 2025


The expected output value (120) doesn't match the comment's calculation (120 * 0.5 = 60). Based on the comment logic and the out_multiplier value, the expected tensor should contain 60, not 120.

Suggested change
Tensor expected = tf_int16.full({1, 16, 18, 24}, 120);
Tensor expected = tf_int16.full({1, 16, 18, 24}, 60);

// With all ones input and weights, and kernel size 3x5=15 * 8 channels = 120
// After applying out_multiplier (0.5 * 2^31), the value is scaled by 0.5
// Expected value: 120 * 0.5 = 60
Tensor expected = tf_int16.full({1, 18, 24, 16}, 120);

Copilot AI Nov 28, 2025


The expected output value (120) doesn't match the comment's calculation (120 * 0.5 = 60). Based on the comment logic and the out_multiplier value, the expected tensor should contain 60, not 120.

Suggested change
Tensor expected = tf_int16.full({1, 18, 24, 16}, 120);
Tensor expected = tf_int16.full({1, 18, 24, 16}, 60);

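The arithmetic in the two test comments above (an `out_multiplier` of 0.5 * 2^31 halving an accumulator of 120) can be checked with a small sketch. The rounding-free 31-bit shift below is a simplification of real requantization code; the function name is illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Applies a Q31 fixed-point multiplier to an int32 accumulator:
// result = acc * (out_multiplier / 2^31), computed in 64-bit to
// avoid overflow, truncated rather than rounded for simplicity.
int32_t apply_out_multiplier(int32_t acc, int32_t out_multiplier) {
  return static_cast<int32_t>(
      (static_cast<int64_t>(acc) * out_multiplier) >> 31);
}
```

With `out_multiplier = 0.5 * 2^31 = 2^30`, an accumulator of 120 rescales to 60, which supports Copilot's point that the expected tensors should hold 60 rather than 120 (unless the tests actually configure a unit multiplier).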
return;
}

bool optimized = 0;

Copilot AI Nov 28, 2025


Variable declaration for 'optimized' appears after the early return, but it was previously at the beginning of the function. This creates inconsistent code structure and the variable is now unreachable when the W8A16 path is taken. Consider moving this declaration back before the W8A16 conditional check for consistency with the original code structure.

// Using simple values for testing
Tensor X = tf_int16.ones({64, 33});
Tensor Y = tf_int8.ones({33, 128});
// Bias not used

Copilot AI Nov 28, 2025


Comment states 'Bias not used' but a bias tensor is created and passed to the function. Either remove the misleading comment or clarify that while bias is passed, it may not affect the expected output in this test case.

Suggested change
// Bias not used
// Bias is passed but does not affect the expected output in this test case

Comment on lines +112 to +113
// Bias not used
Tensor bias = tf_int32.full({128}, -30);

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment states 'Bias not used' but a bias tensor is created and passed to the function. Either remove the misleading comment or clarify that while bias is passed, it may not affect the expected output in this test case.


Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant